library(tidyverse)
library(knitr)
library(janitor)
library("readxl")
library(ggfortify)
library(GGally)
library(qtlcharts)
library(leaps)
library(sjPlot)
library(pheatmap)
library(partykit)
library(rpart)
library(caret)

1. Introduction

Excess weight, especially obesity, has become an epidemic in the 21st century and resulted in many significant health and economic consequences for the global population (Stein and Colditz, 2004). In Australia, this epidemic has also spread as 2 in 3 adults are classified as overweight or obese (Australian Institute of Health and Welfare). Researches has shown that this epidemic is more common in males than females and hence, BYU Human Performance Research Center has collected data from 250 men of various age and obtained estimates of the percentage of body fat through underwater weighing and various body circumference measurements (Rahman and Harding, 2013; DASL, n.d.). As body fat percentage is difficult to calculate in real life, the value for body fat percentage was derived from body density using the Siri’s 1956 equation.

1.1 Sampling method and potential biases

Few details were provided with regards to the sampling method. However, from looking at the dataset, there is a gender bias as the epidemic question is one related to both genders, yet only male were involved in the sample. This suggests that any analysis based on this dataset cannot be applied to the whole population but only the male population.

1.2 Data import, processing and cleaning

data = read.delim("bodyfat.txt") %>% janitor::clean_names()
data = data %>%
  mutate(bmi = (data$weight/(data$height ^ 2)) * 703,
         overweight = case_when(
          bmi >= 25 ~ 1,
          bmi < 25 ~ 0))
data$overweight = as.numeric(data$overweight)
#colnames(data)
data_bmi = data[-c(1:2,4:5,18)]
data_bf = data[-c(1,3:5,17:18)]
data_density = data[-c(2:5,17:18)]
data_overweight = data[-c(4:5,17)]
glimpse(data)
## Observations: 250
## Variables: 18
## $ density    <dbl> 1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.05…
## $ pct_bf     <dbl> 12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, …
## $ age        <int> 23, 22, 22, 26, 24, 24, 26, 25, 25, 23, 26, 27, 32, …
## $ weight     <dbl> 154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.…
## $ height     <dbl> 67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.…
## $ neck       <dbl> 36.2, 38.5, 34.0, 37.4, 34.4, 39.0, 36.4, 37.8, 38.1…
## $ chest      <dbl> 93.1, 93.6, 95.8, 101.8, 97.3, 104.5, 105.1, 99.6, 1…
## $ abdomen    <dbl> 85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.…
## $ waist      <dbl> 33.54331, 32.67717, 34.60630, 34.01575, 39.37008, 37…
## $ hip        <dbl> 94.5, 98.7, 99.2, 101.2, 101.9, 107.8, 100.3, 97.1, …
## $ thigh      <dbl> 59.0, 58.7, 59.6, 60.1, 63.2, 66.0, 58.4, 60.0, 62.9…
## $ knee       <dbl> 37.3, 37.3, 38.9, 37.3, 42.2, 42.0, 38.3, 39.4, 38.3…
## $ ankle      <dbl> 21.9, 23.4, 24.0, 22.8, 24.0, 25.6, 22.9, 23.2, 23.8…
## $ bicep      <dbl> 32.0, 30.5, 28.8, 32.4, 32.2, 35.7, 31.9, 30.5, 35.9…
## $ forearm    <dbl> 27.4, 28.9, 25.2, 29.4, 27.7, 30.6, 27.8, 29.0, 31.1…
## $ wrist      <dbl> 17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2…
## $ bmi        <dbl> 23.62446, 23.33205, 24.66632, 24.88078, 25.51485, 26…
## $ overweight <dbl> 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1…

1.3 Data Visualisation

1.3.1 Obesity and Age

data$overweight = as.character(data$overweight)
ggplot(data, aes(x=age,fill=overweight)) + geom_bar() +scale_fill_brewer(palette = "Paired") +labs(x='Age',y='Count', title = 'Severity of Obesity Across Different Age Groups',caption='Source: DASL Data') +theme(plot.title = element_text(hjust = 0.5, face = 'bold'))+theme(axis.text.x = element_text(angle = 50, hjust = 1))+scale_fill_discrete(name = "Overweight", labels = c("Not Overweight", "Overweight"))

data$overweight = as.numeric(data$overweight)

From looking at the graph, it can be observed that most of the sample comes from the age group between 40-60 and the number of people who appear to be overweight seems to be distributed evenly across the population.

1.3.2 Body Measurements

boxplot(data$neck,data$chest,data$abdomen,data$waist,data$bicep,data$forearm,data$wrist,main = "Upper Body Measurements",names = c("Neck","Chest","Abdomen","Waist","Bicep","Forearm","Wrist"))

boxplot(data$hip,data$knee,data$thigh,data$ankle,main = "Lower Body Measurements",names = c("Hip","Knee","Thigh","Ankle"))

From the two boxplots, the body measurement dataset appears relatively symmetrical and contains a few outliers. This may propose some problems in the later stages of the analysis.

1.4 Analysis Approach

This report aims to determine an alternative method to determine “overweight” individuals oppose to body fat percentage and the two alternative methods considered are:

  1. BMI
  2. Body Density

The analysis will first begin through determining how much variation in body fat percentage can be explained by simply body measurements and the number of measurements that is significant in building an accurate prediction model to examine the ease of calculation.

Then similar analysis will be conducted on BMI and Body Density where the end result will be compared together to determine which method can be explained the best using simple body measurements and offers easier and simpler interpretation.

Lastly, a binary indicator will be added to differentiate the sample into overweight individuals (1) and non-overweight individuals (0) using body-fat percentage as the guiding criteria. A logistic regression is run on the binary indicator with significant variables that we have identified throughout research in order to build a simpler model to determine the odds of an individual being obesed.


2. Analysis

Due to the increasing consumptions of fast food and the elevated convenience of food deliveries, concerns of obesity has risen throughout the world and has reached a new high. This has lead to an increasing need to measure obesity accurately and percentage body fat is arguably the most accurate measure by far. However, the calculation of body fat is difficult and many have switched to Body Mass Index (BMI) for its simpler calculation. This section is looking at how much variation that simple body measurements can explain in the three methods of interest - body fat percentage, BMI and body density.

2.1 Body Fat Percentage

2.1.1 Data Visualisation

qtlcharts::iplotCorr(data_bf)
<<<<<<< HEAD
=======
>>>>>>> eb02b408c5725340d071ce95dd26aa0caf0a559a

Based on the interactive correlation matrix above, it can be seen the level of correlation differs quite drastically between the variables and the backward variable selection method is adopted.

2.1.2 Multiple Regression and Variable Selection

bf_lm = lm(pct_bf~.,data=data_bf)
summary(bf_lm)
## 
## Call:
## lm(formula = pct_bf ~ ., data = data_bf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8684 -2.9088 -0.1904  3.0491 11.1421 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.20340    6.83392   0.322  0.74742    
## neck        -0.45612    0.23034  -1.980  0.04882 *  
## chest       -0.13005    0.09197  -1.414  0.15866    
## abdomen      1.03299    0.07638  13.524  < 2e-16 ***
## waist             NA         NA      NA       NA    
## hip         -0.33000    0.12768  -2.585  0.01034 *  
## thigh        0.08793    0.13395   0.656  0.51217    
## knee        -0.13537    0.22744  -0.595  0.55227    
## ankle        0.05505    0.21751   0.253  0.80041    
## bicep        0.17762    0.17029   1.043  0.29798    
## forearm      0.19468    0.20718   0.940  0.34834    
## wrist       -1.52499    0.50529  -3.018  0.00282 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.341 on 239 degrees of freedom
## Multiple R-squared:  0.737,  Adjusted R-squared:  0.726 
## F-statistic: 66.98 on 10 and 239 DF,  p-value: < 2.2e-16

Using the individual p-value method, the variables that need to be dropped are chest, waist, thigh, knee,ankle, bicep, forearm with ankle being the first to drop down due to its high p-value. However, to double check, the AIC criterion will also be considered.

bf_step_back = step(bf_lm, direction = "backward",trace = FALSE)
summary(bf_step_back)
## 
## Call:
## lm(formula = pct_bf ~ neck + chest + abdomen + hip + bicep + 
##     wrist, data = data_bf)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.668 -2.889 -0.361  3.210 11.148 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.52703    6.63727   0.230 0.818232    
## neck        -0.39650    0.22234  -1.783 0.075783 .  
## chest       -0.12810    0.08992  -1.425 0.155562    
## abdomen      1.01805    0.07431  13.700  < 2e-16 ***
## hip         -0.28758    0.09232  -3.115 0.002060 ** 
## bicep        0.26094    0.15160   1.721 0.086469 .  
## wrist       -1.55084    0.45510  -3.408 0.000767 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.32 on 243 degrees of freedom
## Multiple R-squared:  0.7353, Adjusted R-squared:  0.7287 
## F-statistic: 112.5 on 6 and 243 DF,  p-value: < 2.2e-16

Based on the backward selection model, the fitted model has become:

\[\hat{BodyFat} = 1.52 -0.3965Neck - 0.128Chest\] \[+ 1.01805Abdomen -0.28758Hip + 0.26Vicep -1.55084Wrist\]

2.1.3 Check Assumptions

par(mfrow=c(1,2))
plot(bf_step_back,which=1:2) + theme_bw()

## NULL

The QQ plot shows a straight line which indicates that the normality assumption is reasonable. However, the residuals vs fitted plot shows a slight variation; but given that body fat is hard to predict, this is acceptable.

2.1.4 Final fitted model

relbf <- function(fit,...){
  R <- cor(fit$model)
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]
  rxy <- R[2:nvar, 1]
  svd <- eigen(rxx)
  evec <- svd$vectors
  ev <- svd$values
  delta <- diag(sqrt(ev))
  lambda <- evec %*% delta %*% t(evec)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% rxy
  rsquare <- colSums(beta ^ 2)
  rawwgt <- lambdasq %*% beta ^ 2
  import <- (rawwgt / rsquare) * 100
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"
  import <- import[order(import),1, drop=FALSE]
  dotchart(import$Weights, labels=row.names(import),
           xlab="% of R-Square", pch=19,
           main="Relative Importance of Predictor Variables",
           sub=paste("Total R-Square=", round(rsquare, digits=3)),
           ...)
  return(import)
}
relbf(bf_step_back, col="blue")

##           Weights
## wrist    4.038431
## bicep    7.746760
## neck     8.238378
## hip     16.313788
## chest   21.795458
## abdomen 41.867184

The final model is:

\[ \hat{body-fat} = 1.52 -0.3965neck - 0.128chest + 1.01805abdomen -0.28758hip + 0.26bicep -1.55084wrist \]

and abdomen is relatively the most important predictor for predicting body fat percentage.

Looking at the \(R^2\) value (multiple R-squared) from the summary output, 73.5% of the variability of density is explained by the regression on percentage of Height, Neck, Chest, Abdomen.

  1. On average, holding the other variables constant, a 1 unit increase in Neck leads to a 0.3965 unit decrease in Body Fat Percentage
  2. On average, holding the other variables constant, a 1 unit increase in Chest leads to a 0.128 unit decrease in Body Fat Percentage
  3. On average, holding the other variables constant, a 1 unit increase in Abdomen leads to a 1.01805 unit increase in Body Fat Percentage
  4. On average, holding the other variables constant, a 1 unit increase in Hip leads to a 0.28758 unit decrease in Body Fat Percentage
  5. On average, holding the other variables constant, a 1 unit increase in Bicep leads to a 0.26 unit increase in Body Fat Percentage
  6. On average, holding the other variables constant, a 1 unit increase in Wrist leads to a 1.5508 unit decrease in Body Fat Percentage

2.2 BMI

For this analysis, the formula of BMI is

\[ BMI = \frac{Weight (lbs)*703}{Height(in)^2} \]

2.2.1 Defining the model with population parameters

\[ BMI = \beta_0 + \beta_1Neck + \beta_2Chest \\ + \beta_3Abdomen + \beta_4Waist + \beta_5Hip + \beta_6Thigh + \beta_7Knee + \beta_8Ankle \\+ \beta_9Bicep + \beta_{10}Forearm + \beta_{11}Wrist + \epsilon \]

qtlcharts::iplotCorr(data_bmi)
<<<<<<< HEAD
=======
>>>>>>> eb02b408c5725340d071ce95dd26aa0caf0a559a

Based on the interactive correlation matrix, it can be seen the level of correlation differs quite drastically between the variables and the backward variable selection method is also adopted here.

2.2.2 Multiple Regression with Variable Selection

bmi_lm = lm(bmi~.,data=data_bmi)
summary(bmi_lm)
## 
## Call:
## lm(formula = bmi ~ ., data = data_bmi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1790 -0.6548  0.0086  0.6881  3.8335 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -11.938661   1.725613  -6.919 4.19e-11 ***
## age           0.010049   0.007594   1.323   0.1870    
## neck          0.031474   0.056108   0.561   0.5754    
## chest         0.149956   0.022420   6.688 1.59e-10 ***
## abdomen       0.118223   0.020898   5.657 4.40e-08 ***
## waist               NA         NA      NA       NA    
## hip           0.060113   0.032231   1.865   0.0634 .  
## thigh         0.151883   0.034888   4.353 1.99e-05 ***
## knee         -0.265374   0.056116  -4.729 3.87e-06 ***
## ankle         0.067577   0.053692   1.259   0.2094    
## bicep         0.049712   0.041497   1.198   0.2321    
## forearm       0.086789   0.051015   1.701   0.0902 .  
## wrist        -0.045825   0.129082  -0.355   0.7229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.058 on 238 degrees of freedom
## Multiple R-squared:  0.9046, Adjusted R-squared:  0.9002 
## F-statistic: 205.1 on 11 and 238 DF,  p-value: < 2.2e-16

Using the individual p-value method, the variables that need to be dropped are hip, ankle, bicep, forearm and wrist. To double check, the AIC criterion will also be considered.

bmi_step_back = step(bmi_lm, direction = "backward",trace = FALSE)
summary(bmi_step_back)
## 
## Call:
## lm(formula = bmi ~ chest + abdomen + hip + thigh + knee + forearm, 
##     data = data_bmi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1197 -0.6944 -0.0274  0.6831  3.8464 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.94257    1.43829  -7.608 6.10e-13 ***
## chest         0.16090    0.02122   7.582 7.18e-13 ***
## abdomen       0.12726    0.01826   6.968 3.01e-11 ***
## hip           0.05047    0.03084   1.637   0.1030    
## thigh         0.14983    0.03032   4.942 1.44e-06 ***
## knee         -0.23116    0.05148  -4.490 1.10e-05 ***
## forearm       0.11484    0.04468   2.571   0.0108 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.058 on 243 degrees of freedom
## Multiple R-squared:  0.9024, Adjusted R-squared:    0.9 
## F-statistic: 374.6 on 6 and 243 DF,  p-value: < 2.2e-16

Based on the backward selection model, the fitted model has become:

\[\hat{BMI} = -10.94 +0.161Chest + 0.127Abdomen + 0.050Hip\] \[+ 0.150Thigh - 0.23Knee + 0.115Forearm \]

2.2.3 Check Assumptions

par(mfrow=c(1,2))
plot(bmi_step_back,which=1:2) + theme_bw()

## NULL

The QQ plot shows a straight line which indicates that the normality assumption is reasonable. However, the residuals vs fitted plot shows a fan shaped plot which indicates that the assumption of homogeneous variance is violated. We can use a log transformed response and re-fit the linear regression.

The new model will become: $log() = 1.83 +0.0058chest + 0.0052abdomen + 0.0064 thigh -0.0065knee + 0.0028bicep + 0.0040 forearm $.

ln_bmi_lm = lm(log(bmi)~.,data=data_bmi)
summary(ln_bmi_lm)
## 
## Call:
## lm(formula = log(bmi) ~ ., data = data_bmi)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.133137 -0.024215 -0.000443  0.027919  0.102670 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.7512060  0.0654240  26.767  < 2e-16 ***
## age          0.0005158  0.0002879   1.791 0.074515 .  
## neck         0.0017502  0.0021272   0.823 0.411459    
## chest        0.0055041  0.0008500   6.475 5.37e-10 ***
## abdomen      0.0044728  0.0007923   5.645 4.68e-08 ***
## waist               NA         NA      NA       NA    
## hip          0.0010150  0.0012220   0.831 0.407026    
## thigh        0.0069490  0.0013227   5.253 3.31e-07 ***
## knee        -0.0082150  0.0021276  -3.861 0.000145 ***
## ankle        0.0026638  0.0020357   1.309 0.191940    
## bicep        0.0024437  0.0015733   1.553 0.121689    
## forearm      0.0038084  0.0019341   1.969 0.050112 .  
## wrist       -0.0017188  0.0048940  -0.351 0.725745    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04009 on 238 degrees of freedom
## Multiple R-squared:  0.9079, Adjusted R-squared:  0.9036 
## F-statistic: 213.2 on 11 and 238 DF,  p-value: < 2.2e-16
ln_bmi_step_back = step(ln_bmi_lm, direction = "backward",trace = FALSE)
summary(ln_bmi_step_back)
## 
## Call:
## lm(formula = log(bmi) ~ age + chest + abdomen + thigh + knee + 
##     bicep + forearm, data = data_bmi)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.132191 -0.023020 -0.000321  0.027716  0.106640 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.7989069  0.0524420  34.303  < 2e-16 ***
## age          0.0004265  0.0002629   1.622 0.106083    
## chest        0.0058017  0.0008196   7.079 1.57e-11 ***
## abdomen      0.0047155  0.0007078   6.662 1.80e-10 ***
## thigh        0.0075055  0.0011966   6.273 1.62e-09 ***
## knee        -0.0069254  0.0018984  -3.648 0.000324 ***
## bicep        0.0026331  0.0015455   1.704 0.089720 .  
## forearm      0.0042579  0.0018372   2.318 0.021306 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04002 on 242 degrees of freedom
## Multiple R-squared:  0.9067, Adjusted R-squared:  0.904 
## F-statistic: 335.8 on 7 and 242 DF,  p-value: < 2.2e-16
par(mfrow=c(1,2))
plot(ln_bmi_step_back,which=1:2) + theme_bw()

## NULL
sjPlot::tab_model(bmi_step_back, ln_bmi_step_back, digits = 5, show.ci = FALSE)
  bmi log(bmi)
Predictors Estimates p Estimates p
(Intercept) -10.94257 <0.001 1.79891 <0.001
chest 0.16090 <0.001 0.00580 <0.001
abdomen 0.12726 <0.001 0.00472 <0.001
hip 0.05047 0.103
thigh 0.14983 <0.001 0.00751 <0.001
knee -0.23116 <0.001 -0.00693 <0.001
forearm 0.11484 0.011 0.00426 0.021
age 0.00043 0.106
bicep 0.00263 0.090
Observations 250 250
R2 / R2 adjusted 0.902 / 0.900 0.907 / 0.904

However, although the transformation has aided with the homogeneous variance assumption, the interpretation itself does not make much sense - BMI is determined by an increase in value, not the increase in percentage change. Hence in the final comparison, we will use the untransformed model.

2.2.4 Final Fitted Model

relbmi <- function(fit,...){
  R <- cor(fit$model)
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]
  rxy <- R[2:nvar, 1]
  svd <- eigen(rxx)
  evec <- svd$vectors
  ev <- svd$values
  delta <- diag(sqrt(ev))
  lambda <- evec %*% delta %*% t(evec)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% rxy
  rsquare <- colSums(beta ^ 2)
  rawwgt <- lambdasq %*% beta ^ 2
  import <- (rawwgt / rsquare) * 100
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"
  import <- import[order(import),1, drop=FALSE]
  dotchart(import$Weights, labels=row.names(import),
           xlab="% of R-Square", pch=19,
           main="Relative Importance of Predictor Variables",
           sub=paste("Total R-Square=", round(rsquare, digits=3)),
           ...)
  return(import)
}
relbmi(bmi_step_back, col="blue")

##           Weights
## forearm  9.021850
## knee     9.080285
## thigh   15.075099
## hip     17.245099
## abdomen 24.705834
## chest   24.871833

The final model is:

\[\hat{BMI} = -10.94 +0.161Chest + 0.127Abdomen + 0.050Hip\] \[+ 0.150Thigh - 0.23Knee + 0.115Forearm\] and both chest and abdomen are relatively more important in predicting BMI.

Looking at the \(R^2\) value (multiple R-squared) from the summary output, 90.2% of the variability of density is explained by the regression on percentage of Height, Neck, Chest, Abdomen.
  1. On average, holding the other variables constant, a 1 unit increase in Chest leads to a 0.161 unit increase in BMI
  2. On average, holding the other variables constant, a 1 unit increase in Abdomen leads to a 0.127 unit increase in BMI
  3. On average, holding the other variables constant, a 1 unit increase in Hip leads to a 0.050 unit increase in BMI
  4. On average, holding the other variables constant, a 1 unit increase in Thigh leads to a 0.15 unit increase in BMI
  5. On average, holding the other variables constant, a 1 unit increase in Knee leads to a 0.23 unit decrease in BMI
  6. On average, holding the other variables constant, a 1 unit increase in Forearm leads to a 0.115 unit decrease in BMI

2.3 Body Density

2.3.1 Defining the model with population parameters

\[ Body Density = \beta_0 + \beta_1Pcf.BF + \beta_2Age + \beta_3Weight + \beta_4Height\\ + \beta_5Neck + \beta_6Chest + \beta_7Abdomen + \beta_8Waist + \beta_9Hip + \beta_{10}Thigh\\ + \beta_{11}Knee + \beta_{12}Ankle + \beta_{13}Bicep + \beta_{14}Forearm + \beta_{15}Wrist + \epsilon \]

cor_matrix <- cor(data_density)
pheatmap(cor_matrix, display_numbers = T,na.rm=T)

Above matrix has shown the interactice correlation between variables. Notbaly, Pct.BF has a -0.99 relationship with Density, which means Pct.BF could be used to explain Density. Meanwhile, variables having similar properties are linked together, which could be useful for generating groups.

2.3.2 Check Assumptions

The residuals \(\epsilon_i\) are iid \(N(0,\sigma^2)\) and there is a linear relationship between y and x.

M0 <- lm(density ~ 1, data = data_density)  # Null model
M1 <- lm(density ~ ., data = data_density)  # Full model
autoplot(M1,which=1:2)+theme_bw()

round(summary(M1)$coef, 3)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.082      0.016  68.068    0.000
## neck           0.001      0.001   2.074    0.039
## chest          0.000      0.000   1.818    0.070
## abdomen       -0.002      0.000 -13.645    0.000
## hip            0.001      0.000   2.950    0.003
## thigh          0.000      0.000  -0.980    0.328
## knee           0.000      0.001   0.689    0.492
## ankle          0.000      0.001  -0.675    0.500
## bicep         -0.001      0.000  -1.357    0.176
## forearm        0.000      0.000  -0.947    0.345
## wrist          0.004      0.001   3.310    0.001
step.fwd.aic <- step(M0, scope = list(lower = M0, upper = M1), direction = "forward", trace = FALSE)
summary(step.fwd.aic)
## 
## Call:
## lm(formula = density ~ waist + wrist + hip + chest + bicep + 
##     neck, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.024142 -0.007680  0.000523  0.006156  0.038390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.0844660  0.0154768  70.070  < 2e-16 ***
## waist       -0.0060402  0.0004401 -13.724  < 2e-16 ***
## wrist        0.0038812  0.0010612   3.657 0.000312 ***
## hip          0.0006990  0.0002153   3.247 0.001331 ** 
## chest        0.0003881  0.0002097   1.851 0.065427 .  
## bicep       -0.0007779  0.0003535  -2.201 0.028695 *  
## neck         0.0009609  0.0005185   1.853 0.065030 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01007 on 243 degrees of freedom
## Multiple R-squared:  0.722,  Adjusted R-squared:  0.7152 
## F-statistic: 105.2 on 6 and 243 DF,  p-value: < 2.2e-16
step.back.aic <- step(M1, scope = list(lower = M0, upper = M1), direction = "backward", trace = FALSE)
summary(step.back.aic)
## 
## Call:
## lm(formula = density ~ neck + chest + abdomen + hip + bicep + 
##     wrist, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.024142 -0.007680  0.000523  0.006156  0.038390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.0844660  0.0154768  70.070  < 2e-16 ***
## neck         0.0009609  0.0005185   1.853 0.065030 .  
## chest        0.0003881  0.0002097   1.851 0.065428 .  
## abdomen     -0.0023780  0.0001733 -13.724  < 2e-16 ***
## hip          0.0006990  0.0002153   3.247 0.001331 ** 
## bicep       -0.0007779  0.0003535  -2.201 0.028694 *  
## wrist        0.0038812  0.0010612   3.657 0.000312 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01007 on 243 degrees of freedom
## Multiple R-squared:  0.722,  Adjusted R-squared:  0.7152 
## F-statistic: 105.2 on 6 and 243 DF,  p-value: < 2.2e-16
exh <- regsubsets(density~., data = data_density, nvmax = 15)
## Warning in leaps.exhaustive(a, really.big): XHAUST returned error code -999
plot(exh,scale="bic")

2.3.3 Multiple Regression using the BIC

M2<- lm(formula = density ~ neck + chest + abdomen, 
    data = data_density)
summary(M2)
## 
## Call:
## lm(formula = density ~ neck + chest + abdomen, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.029395 -0.007156 -0.000682  0.007305  0.046687 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1392164  0.0119499  95.333  < 2e-16 ***
## neck         0.0017212  0.0004599   3.743 0.000226 ***
## chest        0.0004569  0.0002135   2.140 0.033361 *  
## abdomen     -0.0021095  0.0001592 -13.250  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01057 on 246 degrees of freedom
## Multiple R-squared:  0.6904, Adjusted R-squared:  0.6867 
## F-statistic: 182.9 on 3 and 246 DF,  p-value: < 2.2e-16
M3<- lm(formula = density ~ neck + chest + abdomen + waist , 
    data = data_density)
summary(M3)
## 
## Call:
## lm(formula = density ~ neck + chest + abdomen + waist, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.029395 -0.007156 -0.000682  0.007305  0.046687 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1392164  0.0119499  95.333  < 2e-16 ***
## neck         0.0017212  0.0004599   3.743 0.000226 ***
## chest        0.0004569  0.0002135   2.140 0.033361 *  
## abdomen     -0.0021095  0.0001592 -13.250  < 2e-16 ***
## waist               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01057 on 246 degrees of freedom
## Multiple R-squared:  0.6904, Adjusted R-squared:  0.6867 
## F-statistic: 182.9 on 3 and 246 DF,  p-value: < 2.2e-16

Drop waist and add other variables

M4<- lm(formula = density ~ neck + chest + abdomen + hip , 
    data = data_density)
summary(M4)
## 
## Call:
## lm(formula = density ~ neck + chest + abdomen + hip, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.028989 -0.007256  0.000047  0.006767  0.045116 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1150223  0.0139695  79.818  < 2e-16 ***
## neck         0.0014682  0.0004584   3.203  0.00154 ** 
## chest        0.0003734  0.0002113   1.768  0.07837 .  
## abdomen     -0.0023671  0.0001759 -13.455  < 2e-16 ***
## hip          0.0006619  0.0002074   3.191  0.00160 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01037 on 245 degrees of freedom
## Multiple R-squared:  0.7028, Adjusted R-squared:  0.6979 
## F-statistic: 144.8 on 4 and 245 DF,  p-value: < 2.2e-16
M5<- lm(formula = density ~ neck + chest + abdomen + hip + thigh , 
    data = data_density)
summary(M5)
## 
## Call:
## lm(formula = density ~ neck + chest + abdomen + hip + thigh, 
##     data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.029545 -0.006988  0.000516  0.007098  0.043367 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1073577  0.0144127  76.832  < 2e-16 ***
## neck         0.0016460  0.0004644   3.544 0.000472 ***
## chest        0.0003393  0.0002107   1.610 0.108626    
## abdomen     -0.0023869  0.0001752 -13.627  < 2e-16 ***
## hip          0.0010639  0.0002889   3.682 0.000285 ***
## thigh       -0.0005716  0.0002878  -1.986 0.048171 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01031 on 244 degrees of freedom
## Multiple R-squared:  0.7075, Adjusted R-squared:  0.7015 
## F-statistic:   118 on 5 and 244 DF,  p-value: < 2.2e-16

Drop chest and add other variables

M6<- lm(formula = density ~ neck + abdomen + hip + thigh + knee , 
    data = data_density)
summary(M6)
## 
## Call:
## lm(formula = density ~ neck + abdomen + hip + thigh + knee, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.029361 -0.007760  0.000360  0.007152  0.043816 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1035095  0.0150104  73.517  < 2e-16 ***
## neck         0.0018149  0.0004395   4.129    5e-05 ***
## abdomen     -0.0022098  0.0001347 -16.402  < 2e-16 ***
## hip          0.0010035  0.0002984   3.363 0.000894 ***
## thigh       -0.0007038  0.0002938  -2.395 0.017355 *  
## knee         0.0007551  0.0005006   1.508 0.132778    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01032 on 244 degrees of freedom
## Multiple R-squared:  0.7071, Adjusted R-squared:  0.7011 
## F-statistic: 117.8 on 5 and 244 DF,  p-value: < 2.2e-16

Drop knee

M7<- lm(formula = density ~ neck + abdomen + hip + thigh , 
    data = data_density)
summary(M7)
## 
## Call:
## lm(formula = density ~ neck + abdomen + hip + thigh, data = data_density)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.030204 -0.007301  0.000653  0.007199  0.045107 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.1104052  0.0143343  77.465  < 2e-16 ***
## neck         0.0019085  0.0004362   4.375  1.8e-05 ***
## abdomen     -0.0022064  0.0001351 -16.337  < 2e-16 ***
## hip          0.0011314  0.0002868   3.945 0.000104 ***
## thigh       -0.0006094  0.0002878  -2.117 0.035236 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01035 on 245 degrees of freedom
## Multiple R-squared:  0.7044, Adjusted R-squared:  0.6996 
## F-statistic:   146 on 4 and 245 DF,  p-value: < 2.2e-16

2.3.4 Final Fitted Model

relweights <- function(fit,...){
  R <- cor(fit$model)
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]
  rxy <- R[2:nvar, 1]
  svd <- eigen(rxx)
  evec <- svd$vectors
  ev <- svd$values
  delta <- diag(sqrt(ev))
  lambda <- evec %*% delta %*% t(evec)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% rxy
  rsquare <- colSums(beta ^ 2)
  rawwgt <- lambdasq %*% beta ^ 2
  import <- (rawwgt / rsquare) * 100
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"
  import <- import[order(import),1, drop=FALSE]
  dotchart(import$Weights, labels=row.names(import),
           xlab="% of R-Square", pch=19,
           main="Relative Importance of Predictor Variables",
           sub=paste("Total R-Square=", round(rsquare, digits=3)),
           ...)
  return(import)
}
relweights(M7, col="blue")

##          Weights
## neck    11.06366
## thigh   13.82063
## hip     19.73563
## abdomen 55.38008

Obviously, abdomen contributes the most in the relationship with body density.

The final model is: \[ Body Density = 1.1104052 + 0.0019085 \times Neck\\ - 0.0022064 \times Abdomen\ + 0.0011314 \times Hip\\ - 0.0006094 \times Thigh\\ \] Looking at the \(R^2\) value (multiple R-squared) from the summary output, 70.4% of the variability of density is explained by the regression on percentage of Height, Neck, Chest, Abdomen.
  1. On average, holding the other variables constant, a 1% increase in Neck leads to a 0.2% unit increase in Density
  2. On average, holding the other variables constant, a 1% increase in Abdomen leads to a 0.2% decrease in Density
  3. On average, holding the other variables constant, a 1% increase in Hip leads to a 0.1% increase in Density
  4. On average, holding the other variables constant, a 1% increase in Thigh leads to a 0.06% decrease in Density

2.3.5 Linear regression assumptions for the stepwise model

autoplot(M7,which=1:2)+theme_bw()


2.4 Predicting Overweight

Which variables can best predict whether a person is overweight?

Since the variable ‘Overweight’ is dichotomous (binary), we perform a logistical regression to interpret the relationship between overweight and other significant variables.

<<<<<<< HEAD
2.4.1 Checking for Significance in a Logistic Regression
data_overweight = data[-c(4:5,17)]
data_overweight$overweight = as.numeric(data_overweight$overweight)
glm1 = glm(overweight ~ ., data = data_overweight)
# drop knee
glm2 = glm(overweight ~ density + pct_bf + age + neck + chest + abdomen + waist + hip + thigh + ankle + bicep + forearm + wrist, data = data_overweight)
# drop ankle
glm3 = glm(overweight ~ density + pct_bf + age + neck + chest + abdomen + waist + hip + thigh + bicep + forearm + wrist, data = data_overweight)
# drop density
glm4 = glm(overweight ~ pct_bf + age + neck + chest + abdomen + waist + hip + thigh + bicep + forearm + wrist, data = data_overweight)
# drop age
glm5 = glm(overweight ~ pct_bf + neck + chest + abdomen + waist + hip + thigh + bicep + forearm + wrist, data = data_overweight)
# drop waist
glm6 = glm(overweight ~ pct_bf + neck + chest + abdomen + hip + thigh + bicep + forearm + wrist, data = data_overweight)

glm1 = glm(overweight ~ ., family = binomial, data = data_overweight)
=======

2.4.1 Checking for Significance in a Logistic Regression

glm1 = glm(overweight ~ ., family = binomial, data = data_overweight)
>>>>>>> eb02b408c5725340d071ce95dd26aa0caf0a559a
# drop hip
glm2 = glm(overweight ~ density + pct_bf + age + neck + chest + abdomen + waist + knee + thigh + ankle + bicep + forearm + wrist, family = binomial, data = data_overweight)

# drop neck
glm3 = glm(overweight ~ density + pct_bf + age + chest + abdomen + waist + knee + thigh + ankle + bicep + forearm + wrist, family = binomial, data = data_overweight)
# drop forearm
glm4 = glm(overweight ~ density + pct_bf + age + chest + abdomen + waist + knee + thigh + ankle + bicep + wrist, family = binomial, data = data_overweight)
# drop pct_bf
glm5 = glm(overweight ~ density + age + chest + abdomen + waist + knee + thigh + ankle + bicep + wrist, family = binomial, data = data_overweight)
# drop wrist
glm6 = glm(overweight ~ density + age + chest + abdomen + waist + knee + thigh + ankle + bicep, family = binomial, data = data_overweight)
# drop age
glm7 = glm(overweight ~ density + chest + abdomen + waist + knee + thigh + ankle + bicep, family = binomial, data = data_overweight)
# drop knee
glm8 = glm(overweight ~ density + chest + abdomen + waist + thigh + ankle + bicep, family = binomial, data = data_overweight)
# drop ankle
glm9 = glm(overweight ~ density + chest + abdomen + waist + thigh + bicep, family = binomial, data = data_overweight)
# drop waist
glm10 = glm(overweight ~ density + chest + abdomen + thigh + bicep, family = binomial, data = data_overweight)
# drop density
glm11 = glm(overweight ~ chest + abdomen + thigh + bicep, family = binomial, data = data_overweight)
# drop bicep
glm12 = glm(overweight ~ chest + abdomen + thigh, family = binomial, data = data_overweight)
summary(glm12)
## 
## Call:
## glm(formula = overweight ~ chest + abdomen + thigh, family = binomial, 
##     data = data_overweight)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.37220  -0.17575  -0.00018   0.15018   2.29089  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -75.89380   11.89377  -6.381 1.76e-10 ***
## chest         0.37854    0.09686   3.908 9.30e-05 ***
## abdomen       0.27128    0.07243   3.746  0.00018 ***
## thigh         0.22381    0.09638   2.322  0.02022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 346.574  on 249  degrees of freedom
## Residual deviance:  98.505  on 246  degrees of freedom
## AIC: 106.5
## 
## Number of Fisher Scoring iterations: 8

Before we start making predictions with the model, we drop the variables which are not a significant predictor for being overweight. The fitted model is shown below.

2.4.2 Fitted Model (log odds scale)

\[ logit(p) = log(\frac{p}{1-p}) = -75.89380 + 0.37854Chest\\ + 0.27128Abdomen + 0.22381Thigh\\ \] where the logit(p) is a special link from our linear combination of predictors to the probability of the outcome being equal to 1, and the coefficients are interpreted as changes in log-odds.

2.4.3 Visualising Coefficients (odds scale) and Predictions

summary(glm12)
## 
## Call:
## glm(formula = overweight ~ chest + abdomen + thigh, family = binomial, 
##     data = data_overweight)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.37220  -0.17575  -0.00018   0.15018   2.29089  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -75.89380   11.89377  -6.381 1.76e-10 ***
## chest         0.37854    0.09686   3.908 9.30e-05 ***
## abdomen       0.27128    0.07243   3.746  0.00018 ***
## thigh         0.22381    0.09638   2.322  0.02022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 346.574  on 249  degrees of freedom
## Residual deviance:  98.505  on 246  degrees of freedom
## AIC: 106.5
## 
## Number of Fisher Scoring iterations: 8
#tab_model(glm0, transform = NULL)
plot_model(glm12) + theme_bw(base_size = 10) + ylim(1, 2) + labs(x = "Overweight", y = "Odds Ratios", title = "Model Coefficients using odds scale",caption='Source: SOCR Data')

plot_model(glm12, type = "pred", terms = c("abdomen", "chest", "thigh"), show.data = TRUE) + theme_bw(base_size = 10) + labs(caption='Source: SOCR Data')

In the Coefficient graph, the three significant variables have similar odd ratios giving the model a smaller confidence interval.

<<<<<<< HEAD

From the Prediction graph, the positive slopes indicate a larger abdomen circumference leads to a high probability of being overweight. Comparing the three individual graphs, the steeper slope indicates bicep circumference correlates with odds of being overweight. ##### 2.4.4 Evaluating (in-sample) performance We correctly classified 92% of the observations, hence our resubstitution error rate, proportion of data predicted incorrectly using the fitted model, is 8.0%.

=======

From the Prediction graph, the positive slopes indicate a larger abdomen circumference leads to a high probability of being overweight. Comparing the three individual graphs, the steeper slope indicates bicep circumference correlates with odds of being overweight.

2.4.4 Evaluating (in-sample) performance

We correctly classified 92% of the observations, hence our resubstitution error rate, proportion of data predicted incorrectly using the fitted model, is 8.0%.

>>>>>>> eb02b408c5725340d071ce95dd26aa0caf0a559a
glm0 = glm(overweight ~ chest + abdomen + thigh, family = binomial, data = data)
data = data %>% 
  mutate(pred_prob = predict(glm0, type = "response"),
         pred_surv = round(pred_prob))

mean(data$overweight == data$pred_surv)
## [1] 0.92
confusion.glm = confusionMatrix(
  data = as.factor(data$pred_surv),
  reference = as.factor(data$overweight))
#confusion.glm$table
confusion.glm
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 117  12
##          1   8 113
##                                           
##                Accuracy : 0.92            
##                  95% CI : (0.8791, 0.9505)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.84            
##                                           
##  Mcnemar's Test P-Value : 0.5023          
##                                           
##             Sensitivity : 0.9360          
##             Specificity : 0.9040          
##          Pos Pred Value : 0.9070          
##          Neg Pred Value : 0.9339          
##              Prevalence : 0.5000          
##          Detection Rate : 0.4680          
##    Detection Prevalence : 0.5160          
##       Balanced Accuracy : 0.9200          
##                                           
##        'Positive' Class : 0               
## 

Based on the matrix above, out of the 117 + 8 not overweight people, the model successfully predicts 117 of them. Out of the 113 + 12 overweight people, the model successfully predicts 113 of them.

The odds of being overweight for someone with an above average abdomen circumference of 110 is 5.31.

predict_overweight = data.frame(abdomen = 110, chest = mean(data$chest), thigh = mean(data$thigh))
predict(glm12, newdata = predict_overweight, type = "link")
##        1 
## 5.312144

The odds of being overweight for someone with a below average abdomen circumference of 65 is -6.90.

predict_overweight = data.frame(abdomen = 65, chest = mean(data$chest), thigh = mean(data$thigh))
predict(glm12, newdata = predict_overweight, type = "link")
##         1 
## -6.895542

The odds of being overweight for someone with slightly above average circumferences for their abdomen, chest, and bicep is 8.146941.

predict_overweight = data.frame(abdomen = mean(data$abdomen)*1.1, chest = mean(data$chest)*1.1, thigh = mean(data$thigh)*1.1)
predict(glm12, newdata = predict_overweight, type = "link")
##        1 
## 8.146941

2.4.5 Decision Tree

data$overweight=as.character(data$overweight)
ov_tree = rpart(overweight ~ abdomen + chest + thigh, data = data, method = "class",control = rpart.control(cp = 0.008))
ov_tree
## n= 250 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 250 125 0 (0.50000000 0.50000000)  
##   2) chest< 101.55 144  23 0 (0.84027778 0.15972222)  
##     4) abdomen< 92.6 126  10 0 (0.92063492 0.07936508) *
##     5) abdomen>=92.6 18   5 1 (0.27777778 0.72222222) *
##   3) chest>=101.55 106   4 1 (0.03773585 0.96226415) *
plot(as.party(ov_tree))

Out of a total of 250 observations, 96% of people with a chest circumference greater or equal to 101.55cm are overweight. Also, of the people with a chest circumference less than 101.55cm, those with an abdomen circumference greater or equal to 92.6cm are 72% likely to be overweight. On the other hand, those with an abdomen circumference less than 92.6cm are 92% likely to be NOT overweight.

3. Limitations

  1. Privacy Issues
  2. Age Range
  3. hist(data$age, freq=FALSE, col="aliceblue", xlab="Age", main="Age Range Histogram")
    curve(dnorm(x, mean=mean(data$age), sd=sd(data$age)), add=TRUE, col="red") #line
  4. Multicollilinearity

4. Conclusion

5. References

  1. Australian Institute of Health and Welfare (AIHW). (2019). Overweight & obesity. Australian Government. Retrieved from https://www.aihw.gov.au/reports-data/behaviours-risk-factors/overweight-obesity/overview
  2. DASL. (n.d.). Bodyfat. DASL. Retrieved from < https://dasl.datadescription.com/datafile/bodyfat>
  3. Rahman, A., & Harding, A. (2013). Prevalence of overweight and obesity epidemic in Australia: some causes and consequences. JP Journal of Biostatistics, 10(1), 31-48.
  4. Stein, C. J., & Colditz, G. A. (2004). The epidemic of obesity. The Journal of Clinical Endocrinology & Metabolism, 89(6), 2522-2525.